ABSTRACT

The mobility behavior of human beings is predictable to a
varying degree, e.g. depending on personality traits such as
extraversion versus introversion: the mobility of introverted
users may be more dominated by routines and habitual movement
patterns, resulting in a more predictable mobility behavior on
the basis of their own location history, while, in contrast,
extroverted users get about a lot and are explorative by nature,
which may hamper the prediction of their mobility. However,
socially more active and extroverted users meet more people and
share information, experiences, beliefs, thoughts etc. with
others, which in turn leads to a high interdependency between
their mobility and social lives. Using a large LBSN dataset,
this paper investigates the interdependency between human
mobility and social proximity, the influence of social networks
on enhancing location prediction of an individual and the
transmission of social trends/influences within social networks.

Categories and Subject Descriptors 

H.4 [Information Systems Applications]: Miscellaneous

Keywords 

Mobile Homophily, Location Prediction, Social Network Analysis,
Influence Model.

1. INTRODUCTION

General (social) homophily refers to the tendency of humans
to socially connect to other individuals with similar personal
properties [52]. If similarity is evaluated with respect to
geographical distance (e.g. of the center of living), this
tendency may be referred to as Propinquity [38]. If the
similarity is evaluated with respect to locations visited
and/or the temporal sequence of these visits, we speak of
mobile homophily. More technically, for the rest of this
contribution, mobile homophily will refer to similarity between
users with respect to their mobility behavior. We investigate
the relation between human social relations and mobile
homophily with the ultimate goal of improving next location
predictions for users using social network data. In a first
part we study the correlations between social relations and
mobile homophily, using a large dataset from a location based
social network (LBSN). Here, we especially focus on the effects
of tie strength and dense sub-groups. In a second part we study
to what extent we can exploit these correlations for improving
next location predictions on the basis of data on users' social
relations.


2. SOCIAL RELATIONS AND MOBILE HOMOPHILY: RELATED WORK

Social relations and geographic distance d exhibit many
interesting interrelations. Propinquity has been studied in the
form of the probability of friendship relations as a function
of d [48, 25, 59, 44, 8, 68, 69, 77]. Most studies find a
power law relation p(d) \propto d^{-\alpha} with slightly
different exponents \alpha. [77] find an inverse correlation
between the distance of the centers of life of two users and the
relative size of the overlap of their immediate social
relations. In contrast to that, [39] found that purely online
(virtual) interaction between users may not be strongly
influenced by distance. The mutual influence of mobility and
social tie strength has also been investigated by [78, 15].
[69] show that a substantial share of new friendships may be
predicted from co-location events of users. [?] were able to
reconstruct a social network via an analysis of the motion
patterns of the corresponding users. In view of social link
prediction [47], [70] show that ~30% of new friendships are
formed between persons that have visited at least one common
location. Regarding mobile homophily, [18, 16] show that the
probability of two users having social ties increases with the
number of co-locations and, reciprocally, that the number of
shared locations of friends increases over time. [17, 58]
investigate further relations between mobility behavior and
social relationships, taking the category of the locations into
account. [17] also regard the regularity of the mobility
patterns. Using a large mobile phone based data-set with
cell-tower-based localization granularity, [78] also investigate
social ties and social tie strength in relation to the mobility
patterns of the respective users. We will refer to elements of
this study later in more depth.

However, most of these studies do not explicitly focus on the
influence of social relationships on next location prediction.
Before we investigate these influences in section 5, we discuss
the results of our study on the relation between social network
and mobility behavior. This study uses our Foursquare data-set,
which is described next.

3. DATA-SET 

In contrast to studies using cell-tower- or Bluetooth-based
location and co-location inference [50, 78], Location Based
Social Networking (LBSN) platforms provide GPS-accurate and
explicit declarations of visits of users to locations
(“check-ins”) and more detailed data on the nature of these
locations. Compare [15, 14, 28] for recent studies using LBSN
data-sets. LBSN check-ins do not allow direct detection of
social location visits, but we may heuristically infer these
co-location visits with an accuracy comparable to Bluetooth
encounters via their time-stamps. The time stamps of check-ins
also allow us to investigate e.g. the dynamics of influence of
users on other users [31, 67] with respect to location behavior
[60, 79, 7] and allow for assessments of the social influence
on next location prediction.

To collect a data-set we used the LBSN Foursquare [1], which
provides fine grained locations with time-stamped check-ins
and social networking services for users, allowing us to
construct a social network. We restricted the collection to all
check-ins of all venues of the San Francisco area because
Foursquare has a large active user-base in that area and thus
we assumed that the collected check-ins represent a temporally,
spatially and socially sufficiently dense cover of the actual
mobility of the involved users.

For the social network, we extract the friends and the
friends of friends of all users who have made at least one
check-in within the four month period (122 days) of data
collection. We refer to the set of users who have generated
at least 50 check-ins within the data collection period as the
set of active users. We use the set of active users for
investigating human mobility behavior, because of the
availability of sufficient location data. Table 1 contains
descriptive statistics for the data-set. The average degree
is higher and the mean average path length is shorter for
the social networks containing only the active users, their
friends and their friends of friends, because the active users
have been on Foursquare for a longer period of time; otherwise
they would not have generated more check-ins on average
compared to all users (Table 1).

The Foursquare social network is a hybrid social network
containing both real world and online friends, thus, the
average degree of (96) in our dataset is lower compared to pure
online social networks such as Facebook with an average degree
of (190) [2]. Further, the mean average path lengths
in both social networks with all users and active users are
found to be 4.152 for our dataset and 3.8 for Facebook [2]
respectively, which is also in line with Stanley Milgram’s small
world phenomenon [53, 75].

The clustering coefficients of both social networks with
all users and only active users are 0.104 for our dataset
and 0.1438 for Facebook respectively. The clustering
coefficient is a good indicator of the existence of a real social
network among a set of users and their relationships [82].
In order to assert that the social network induced by the
Foursquare users represents a valid social network, we compared
its clustering coefficient with the clustering coefficient
of a randomly generated social network with the same number
of ties and the same average degree using the Poisson
random graph method presented in [55]. The clustering
coefficient of the random graph was found to be 0.0002 on
average, which is considerably lower than the clustering
coefficients of the Foursquare social networks. The numbers
point to the conclusion that the social network induced by
the Foursquare users can indeed be assumed to represent a
valid social network.

The check-in statistics for the active users show that on
average the locations are visited by many users (21.36), and
each user visits many locations (62.35) with a very low
frequency (2.04) and an average degree of repetition of (1.04),
which means most of the locations are publicly accessible.
We calculated entropy values for each user over the locations
visited and for each location over its visiting users. The
average user and location entropy for the active users are
found to be (3.48)

Quantity                                    all               active
# nodes (users): U                          141,750           9,173
# edges (ties): t_U \in U x U               5,327,041         618,970
Av. degree (users)                          37.58             67.59
# nodes (users + friends): UF               1,747,783         261,780
# edges (ties): t_UF \in UF x UF            74,585,447        25,293,730
Av. degree (users + friends)                42.67             96.62
# nodes (users + friends + fof): UFF        7,954,935         1,155,324
Mean average path length                    4.152 ± 0.58      3.8 ± 0.57
Clustering coefficient [81]                 0.104             0.1438
# locations visited by U                    30,630            26,780
# check-ins by U                            1,983,772         1,164,085
Av. # check-ins per user and day            0.12              1.04
Av. # check-ins per location                64.77 ± 436.17    43.47 ± 139.72
Av. # check-ins per user                    13.99 ± 37.57     126.90 ± 84.28
Av. # locations per user                    8.71 ± 17.45      62.35 ± 31.99
Av. # users per location                    40.33 ± 282.80    21.36 ± 71.11
Av. # check-ins per user and location       1.61 ± 3.35       2.04 ± 4.74
Av. degree of repetition                    0.61 ± 3.35       1.04 ± 4.74
Av. user entropy                            1.15 ± 0.99       3.48 ± 0.71
Av. location entropy                        1.73 ± 1.60       1.66 ± 1.57

Table 1: Descriptive statistics of Foursquare data-set


and (1.66) respectively, which substantiates the finding that 
the locations are rather publicly accessible. 

4. CORRELATIONS BETWEEN SOCIAL PROXIMITY AND MOBILE HOMOPHILY

4.1 Measures of Social Cohesion 

Social cohesion can be calculated using different approaches
depending on the social proximity measurements they rely
on, such as Neighborhood-, Distance-, Density- and Cluster-based
measurements. We used three neighborhood-based measurements:

Common Neighbors (CN): The number of common friends
between two users. Social cohesion between two users is
higher, the higher the number of their common friends ([78]).

Adamic-Adar (AA): The common neighbors measurement does not
differentiate between the common neighbors of two users. A user
with a high degree n (a popular user with thousands of friends)
is a potential common neighbor of n(n - 1)/2 pairs of users.
Adamic & Adar therefore use a normalized version of the common
neighbors measurement CN. It penalizes the contribution of each
neighbor u_k \in CN(u_i, u_j) by the inverse logarithm of its
degree [6, 78].

Jaccard Coefficient (Jacc): The Jaccard coefficient sets the
number of common neighbors of two users in relation to
the total number of friends of both users. The higher the
ratio of common neighbors between two users compared to
their total friends, the higher the social cohesion ([78]).

Additionally we use one density-based measurement, namely the
Degree of Cliquishness (DoC). DoC quantifies to which extent
the friends of two users build a cohesive group. The
social cohesion between two users is higher if their friends
are interconnected more closely.
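
The three neighborhood-based measurements can be illustrated with a short sketch. It assumes the friendship graph is stored as a dictionary of neighbor sets; the function names and the toy graph are hypothetical and are not taken from the study.

```python
import math


def common_neighbors(adj, u, v):
    """Return the set of friends shared by users u and v."""
    return adj[u] & adj[v]


def adamic_adar(adj, u, v):
    """Sum 1/ln(degree) over the common neighbors of u and v.

    Neighbors with degree 1 are skipped because ln(1) = 0.
    """
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in common_neighbors(adj, u, v)
        if len(adj[w]) > 1
    )


def jaccard(adj, u, v):
    """Common neighbors divided by the size of the union of both friend sets."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


# Hypothetical toy friendship graph (undirected, stored symmetrically).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

cn_ac = len(common_neighbors(adj, "a", "c"))
aa_ac = adamic_adar(adj, "a", "c")
jacc_ac = jaccard(adj, "a", "c")
```

In the toy graph, users a and c share the two friends b and d, each of degree 2, so popular common neighbors would contribute less to AA than these low-degree ones do.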

4.2 Measures of Mobile Homophily 






























Mobile homophily (proximity) refers to the extent of overlap
between the movements of two individuals [78]. We calculate
mobile homophily between two individuals using the following
measurements, with an emphasis on spatial overlap (first
three) and on spatial-temporal overlap (last):

Spatial Co-location Count (Col): a spatial co-location is
a location visited by two users, but not necessarily at the
same time. Col simply counts the cases in which two users
visit the same location within a time frame of one week
(Equation 1).

Col(u_i, u_j) = \sum_{s_l^{(u_i)} \in S^{(u_i)}} \sum_{s_l^{(u_j)} \in S^{(u_j)}} \Theta(W - |T(s_l^{(u_i)}) - T(s_l^{(u_j)})|)    (1)

S^{(u_i)} is the set of visits of user u_i, T(s) is the time of
visit s, and \Theta is the Heaviside step function for two
visits s_l^{(u_i)} and s_l^{(u_j)} of both users u_i and u_j to
the same location l within a time frame of one week W ([78]).
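
The counting idea behind Col can be sketched as follows. This is a minimal illustration, assuming each visit is a (location, timestamp in seconds) pair and that a time difference of exactly one week still counts; the function name and the sample visits are hypothetical.

```python
from collections import defaultdict

WEEK_SECONDS = 7 * 24 * 3600


def colocation_count(visits_i, visits_j, window=WEEK_SECONDS):
    """Count pairs of visits to the same location at most `window` seconds apart."""
    # Index the second user's visit times by location.
    times_by_location = defaultdict(list)
    for location, t in visits_j:
        times_by_location[location].append(t)

    count = 0
    for location, t_i in visits_i:
        for t_j in times_by_location.get(location, ()):
            # Step function: contributes 1 if the time gap fits in the window.
            if abs(t_i - t_j) <= window:
                count += 1
    return count


DAY = 24 * 3600
# Hypothetical check-ins of two users.
visits_a = [("cafe", 0), ("gym", 100_000), ("park", 0)]
visits_b = [("cafe", 3600), ("gym", 100_000 + 8 * DAY), ("cafe", 10 * DAY)]

col_ab = colocation_count(visits_a, visits_b)
```

Only the two cafe visits one hour apart fall inside the one-week window here; the gym visits are eight days apart and the second cafe visit is ten days later.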

Spatial Co-location Rate (SCol): The probability that two
users u_i and u_j visit the same location within a period of
one week:

SCol(u_i, u_j) = \sum_{l_k \in L} p^{(u_i)}(l_k, t) \cdot p^{(u_j)}(l_k, t)    (2)

where p^{(u_i)}(l_k, t) is the probability of user u_i to follow
user u_j to location l_k within a time frame of one week (t = 1
week). SCol assumes the visits of both users to occur
independently ([78]); thus this measurement must show no
correlation to the social proximity between the two users,
otherwise the assumption is rejected and the visit of the one
user u_i depends on the visit of the other user u_j.

Spatial Cosine Similarity (SCos): This measurement refers
to the degree of spatial overlap between the trajectories of
two users disregarding the time of the visits, i.e. the
co-presence of both users at the same location ([78]).
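
One way to make the spatial overlap concrete is the cosine similarity of two visit-count vectors. The text does not spell out the vector representation, so this sketch assumes one dimension per location with the number of visits as the entry; the function name and sample data are hypothetical.

```python
import math
from collections import Counter


def spatial_cosine(locations_i, locations_j):
    """Cosine similarity of two users' visit-count vectors, ignoring visit times."""
    counts_i, counts_j = Counter(locations_i), Counter(locations_j)
    shared = counts_i.keys() & counts_j.keys()
    dot = sum(counts_i[loc] * counts_j[loc] for loc in shared)
    norm_i = math.sqrt(sum(c * c for c in counts_i.values()))
    norm_j = math.sqrt(sum(c * c for c in counts_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # a user without visits has no overlap with anyone
    return dot / (norm_i * norm_j)


# Hypothetical location histories (one entry per check-in).
scos_example = spatial_cosine(["cafe", "cafe", "gym"], ["cafe", "park"])
```

Two users with no common location obtain 0, while proportional visit distributions obtain 1 regardless of how many check-ins each user made.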

Social Situation Rate (s): a measurement with an emphasis on
spatial-temporal overlap. It guarantees that the two users are
present at the same time at the same location, meaning that
both users are co-present ([46]). We assume that at least two
users visiting a location within a time frame of one hour are
involved in a social situation. The measure s(u_i, u_j) is
calculated by counting the number of cases where two users u_i
and u_j visit the same location within a time frame \Delta t,
normalized by the total number of times when both users were
observed within this time frame \Delta t of one hour
(Equation 3).

s(u_i, u_j) = \frac{\sum_{s_l^{(u_i)} \in S^{(u_i)}} \sum_{s_l^{(u_j)} \in S^{(u_j)}} \Theta(\Delta t - |T(s_l^{(u_i)}) - T(s_l^{(u_j)})|)}{N_{\Delta t}(u_i, u_j)}    (3)

where S^{(u_i)} is the set of visits of user u_i, s_l^{(u_i)} is
any visit of user u_i, T(s) is the time of visit s,
N_{\Delta t}(u_i, u_j) is the total number of times both users
were observed within the time frame \Delta t, and \Theta is the
Heaviside step function for two simultaneous visits s_l^{(u_i)}
and s_l^{(u_j)} of both users u_i and u_j to occur within the
time frame \Delta t at the same location l.

Additionally we use a weighted version of the above mea¬ 
surements using the following weighting factors: 

Location density: represents the density of other locations
in the vicinity of a location l_k. Visits of two users to the
same location l_k are weighted higher, the higher the density
of other locations is in the vicinity of location l_k. The
rationale behind this weighting schema is the assumption
that in highly populated areas (which are assumed to have a
high density of specific public locations) a co-location event
is less likely than in a sparsely populated area where the few
specific locations act as focal points attracting people.

Distance From Home Location: If users live within a short
distance from each other, the probability of being co-located
by chance only is much higher than in a situation where their
center of life locations are located farther apart.
We thus weight visits of two users to a location with the
logarithm of the distance between their home locations. Note:
Due to the lack of information, we assume the center of the
region with high check-in density as the home location of a
user ([77, 56, 57, 69]).

Location Population p(l_k): People usually tend to
purposefully meet important friends at locations with low
population such as their homes or small restaurants/bars in the
vicinity of their homes, rather than at locations with high
population such as subway stations. Each visit to a location
l_k is weighted as inversely proportional to the log of the
population size |p(l_k)| at that location [78].

Location Entropy: The significance of user meetings can
be assumed to be higher at low entropic locations than at high
entropic locations, which can rather be assumed to be
public locations that many people frequently visit (such as
large subway stations). Thus another weighting schema
inversely proportional to the location entropy is introduced.

H(l) = -\sum_i p(u_i, l) \ln p(u_i, l)    (4)
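
A minimal sketch of the location entropy in Equation 4 follows. It assumes that p(u_i, l) is the fraction of check-ins at location l made by user u_i, which the text does not state explicitly; the function name and the sample data are hypothetical.

```python
import math
from collections import Counter


def location_entropy(checkin_users):
    """Entropy of a location, given the user id of every check-in made there."""
    counts = Counter(checkin_users)
    total = sum(counts.values())
    # H(l) = -sum_i p(u_i, l) * ln p(u_i, l), with p taken as a check-in share.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# A private place visited by one user versus a public place shared by four users.
home_entropy = location_entropy(["u1", "u1", "u1", "u1"])
station_entropy = location_entropy(["u1", "u2", "u3", "u4"])
```

A location visited by a single user has entropy 0, while a location shared evenly among four users reaches ln 4, so an inverse-entropy weight favors meetings at the first kind of place.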

4.3 Correlations 

Mobile homophily refers to the tendency of similar
individuals to be interested in the same locations [78]. An
analysis of correlation between social and mobile homophily was
conducted in order to assert mobile homophily among socially
connected individuals. Table 2 contains first evidence
for mobile homophily as it shows higher mobile homophily
among friends compared to random pairs. Friends share on
average (4.29) locations, whereas random pairs share only
(1.61) locations on average. Further, friends are involved on
average in (5.74) social situations, whereas the corresponding
number for random pairs is only (0.14).


Type            Ø Common locations    Ø Social situations
Friends         4.29                  5.74
Random pairs    1.61                  0.14

Table 2: Friends have on average 2.5 times more common
locations than random pairs of users and are involved in
about 40 times more social situations within 1 hour.

Furthermore, a correlation analysis between mobile
homophily and social cohesion was conducted. We use 100 000
randomly chosen sample pairs of users from the social network
G with the whole population. We sample pairs of users
using simple random sampling with replacement.
The users in a sample pair are not necessarily socially
connected. The hypothesis is that mobile homophily is
correlated with social cohesion. We refer to certain intervals
of the correlation coefficient with the following equivalences:
> 0.7 corresponds to a very strong correlation, [0.4, 0.7]
corresponds to a strong correlation, [0.1, 0.4] corresponds to
a moderate correlation, < 0.1 corresponds to weak or no
correlation.
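
The correlation step can be sketched as a plain Pearson coefficient followed by the interval labels given above. The sketch assumes that boundary values (0.4 and 0.7) fall into the higher-ranked adjacent class where the stated intervals overlap, and it applies the thresholds to r as written, so negative coefficients land in the lowest class; both function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strength_label(r):
    """Map a coefficient to the verbal classes used in the text."""
    if r > 0.7:
        return "very strong"
    if r >= 0.4:  # assumption: 0.4 and 0.7 themselves count as "strong"
        return "strong"
    if r >= 0.1:
        return "moderate"
    return "weak or none"


# Perfectly linear toy data, used only to exercise pearson_r.
r_toy = pearson_r([1, 2, 3], [2, 4, 6])
```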










              CN        AA       DoC      Jacc
SCos          -0.056    0.008    0.16     0.004
SCos-User     -0.058    0.008    0.15     -0.017
SCos-Dens     -0.052    0.013    0.155    0.013
SCos-Dist     -0.037    0.014    0.155    0.024
SCos-Entr     -0.048    0.008    0.155    0.004
SCol          -0.171    0.021    0.131    0.016
Col           -0.207    0.023    0.131    -0.007
s             -0.066    0.043    0.112    0.021
s-Dens        -0.027    0.048    0.105    0.022
s-Dist        -0.01     0.07     0.084    0.014
s-User        -0.029    0.043    0.109    0.014
s-Entr        -0.029    0.048    0.108    0.019

Table 3: Pearson’s correlation coefficient r between mobile
homophily and network proximity for 100 000 randomly chosen
pairs of users (setting \Delta t = 1 hour for spatial-temporal
overlap).

The result of the correlation analysis is shown in Table 3.
A slight correlation is noticeable between DoC and all mobile
homophily measurements; the remaining measurements show
no correlation. The anti-correlation observed for CN can be
explained by measurement artifacts of the data-set, because the
location data is limited to a city (San Francisco), whereas the
social network is global and contains users from the whole
world. A reliable correlation might not be determined for many
pairs of users due to the lack of location data.

4.3.1 Impacts of Propinquity on Social Cohesion 

Tobler’s first law of geography states that “everything is
related to everything else, but near things are more related
than distant things” [74]. Propinquity refers to the tendency
of individuals to have their ties with others in their
geographical vicinity [38]. For example, two users living in the
same building have a higher propinquity than two users living in
different buildings [24]. We constrain the social network to
the active users and their friends and friends of friends (FoF)
from San Francisco (home city) in order to investigate the
effects of geographical constraints on the social network, in
accordance with Tobler’s statement and the propinquity effect.
We refer to the induced graph as G_hc.


Type                Clustering      Ø Mean average     Av.
                    coefficient     path length        degree
All                 0.104           4.152 ± 0.58       42.67
Active users        0.1438          3.8 ± 0.57         96.62
Active home city    0.38118         3.6745 ± 0.55      -

Table 4: A comparison between the results of both clustering
coefficients and average shortest path for three different
graphs induced by all users, active users and users from San
Francisco respectively.

The induced social network has a higher average degree (67)
compared to all users (37). The clustering coefficient of G_hc
is found to be C_{G_hc} ≈ 0.38, which is significantly higher
than the clustering coefficients C_G ≈ 0.104 and
C_{G_au} ≈ 0.1438. The average shortest path decreases from
4.152 ± 0.58 for G to 3.6745 ± 0.55 for G_hc. The significance
of the change in the average shortest path is confirmed by a
two-sided unpaired t-test (p(e) = 1.09 * 10^{?}). The results
show indeed that geographically close people form a closer knit
network. A new correlation analysis using random pairs
of users from G_hc substantiates the above findings. Table 5
shows how the correlation between mobile homophily
and network proximity significantly increases. The
significance of the changes in CN, AA, Jacc and DoC is
confirmed by 3 two-sided unpaired t-tests (with p(e) values
0.000601, 0.0122 and 0.000124 respectively).



                CN       AA       DoC      Jacc
SCos            0.417    0.225    0.168    0.315
SCos-User       0.24     0.185    0.184    0.327
SCos-Dens       0.418    0.221    0.169    0.32
SCos-Dist       0.389    0.154    0.153    0.273
SCos-Entr       0.267    0.192    0.175    0.313
SCol            0.141    0.101    0.188    0.25
Col             0.309    0.122    0.188    0.234
s               0.326    0.242    0.151    0.25
s-Dens          0.309    0.281    0.16     0.341
s-Dist          0.342    0.272    0.169    0.353
s-User          0.215    0.231    0.178    0.349
s-Entr          0.28     0.261    0.176    0.35
s-Extra Role    0.052    0.163    0.145    0.351

Table 5: Pearson’s correlation coefficient r between mobile
homophily and network proximity for 100,000 randomly chosen
pairs of users from G_hc (setting \Delta t = 1 hour for
spatial-temporal overlap).

4.3.2 Impacts of Cohesive Subgroups on Mobile Homophily

People typically maintain their social relationships on
different scales due to cognitive, emotional, spatial, and
temporal limits. Therefore, they interact mainly with a small
group of their acquaintances ([84, 37] as cited by [39]), who
are more likely to form a cohesive subgroup together.
From the behavioral perspective, members of a cohesive
subgroup share information and have homogeneity of interests
and beliefs [9]; therefore they exhibit a high similarity in
their behavior, including mobile homophily. Different types
of cohesive subgroups exist from the perspective of social
network analysis, such as cliques and n-plexes. The main
property of cliques is the completeness of the ties inside the
groups, where all members are connected to each other.
N-plexes relax the completeness constraint of cliques by
allowing for up to N missing connections for each group member.
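
The difference between the two subgroup types can be checked with a few lines. This sketch follows the wording above (each member may miss up to N ties to other members); note that some textbook definitions of a k-plex also count the member itself among the allowed misses, which shifts the parameter by one. The function names and the toy graph are hypothetical.

```python
def missing_ties(adj, group, member):
    """Number of other group members that `member` is not connected to."""
    return sum(1 for other in group if other != member and other not in adj[member])


def is_clique(adj, group):
    """True if every member is tied to every other member."""
    return all(missing_ties(adj, group, m) == 0 for m in group)


def is_n_plex(adj, group, n):
    """True if every member misses at most n ties inside the group."""
    return all(missing_ties(adj, group, m) <= n for m in group)


# Hypothetical graph: four users, all tied to each other except b and d.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}
group = {"a", "b", "c", "d"}

clique_check = is_clique(adj, group)
plex_check = is_n_plex(adj, group, 1)
```

The four users fail the clique test because of the single missing tie between b and d, yet they pass once one missing connection per member is tolerated, which is why relaxed subgroups are easier to find in real data.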

We use the algorithm proposed by [76] for enumerating 
maximal cliques based on a binary tree with n levels. We 
stopped the enumeration after having detected 8700 cliques 
with a minimum size of 3. The largest clique contains 17 
users. The completeness requirement of cliques makes them 
impractical for the detection of real-life cohesive subgroups, 
therefore we slightly relax the cliques to 2-plexes in order 
not to change the clique properties significantly. We detect 
29591 2-plexes with a minimum size of 3, the plexes contain 
on average 7.79 ± 2.41 vertices. The largest 2-plex contains 
20 vertices, which is in rough agreement with the human 
social perception limit in [25]. 

We repeated our correlation analysis by sampling 100 000
pairs of users, where each pair is chosen from the same
2-plex. Table 6 compares the average values of all mobile
homophily and social cohesion measurements with the
corresponding values of the previous correlation analysis
between pairs of users from the same home city. All social
cohesion measurements are as much as a factor of 50 and the
mobile homophily measurements as much as a factor of 23 higher.
The results show a significantly stronger interdependence
between spatial and social cohesion among the members of
cohesive subgroups.



                Home city          2-plex
SCos            0.013 ± 0.012      0.11 ± 0.101
WSCos-Dens      0.014 ± 0.012      0.111 ± 0.101
WSCos-Dist      0.008 ± 0.007      0.055 ± 0.051
WSCos-User      0.008 ± 0.006      0.099 ± 0.075
WSCos-Entr      0.01 ± 0.008       0.107 ± 0.09
SCol            0.001 ± 0.001      0.009 ± 0.007
Col             1.626 ± 1.27       40.006 ± 29.176
s               0.058 ± 0.03       4.538 ± 3.284
s-Dens          0.006 ± 0.003      0.158 ± 0.124
s-Dist          0.003 ± 0.001      0.054 ± 0.042
s-User          0.002 ± 0.001      0.044 ± 0.035
s-Entr          0.004 ± 0.002      0.131 ± 0.103
s-Extra Role    0.002 ± 0.001      0.054 ± 0.037
CN              0.193 ± 0.103      14.842 ± 17.881
AA              0.102 ± 0.054      7.804 ± 10.526
Jacc            0.005 ± 0.002      0.059 ± 0.112
DoC             0.002 ± 0.001      0.082 ± 0.081

Table 6: The average values for network proximity and mobile
homophily measurements for 100 000 random pairs chosen from
2-plexes and 100 000 random pairs chosen from G_hc.

                r       \rho    p(e)
SCos            0.45    0.6     0
WSCos-Dens      0.43    0.6     0
WSCos-Dist      0.5     0.63    0
WSCos-User      0.46    0.67    0
WSCos-Entr      0.51    0.65    0
SCol            0.15    0.43    0.0006
Col             0.22    0.23    0.0562
s               0.48    0.63    0
s-Dens          0.57    0.73    0
s-Dist          0.66    0.81    0
s-User          0.6     0.78    0
s-Entr          0.58    0.79    0
s-Extra Role    0.32    0.75    0

Table 7: The correlation between the 2-plex cohesion
measurement calculated according to Equation 5 and all mobile
homophily measurements (r: Pearson, \rho: Spearman). The last
column contains the p-value of the corresponding Spearman’s
correlation coefficient. The p-value indicates the probability
that social proximity and mobile homophily have no relationship
(page ??).

Cohesion is a measure that allows the identification of 
more important subgroups among the set of all subgroups 
[41]. A more cohesive subgroup has more ties inside and 
fewer outside the subgroup. Given a cohesive subgroup and 
the adjacency matrix A, C{U) measures the degree of cohe¬ 
sion in a cohesive subgroup (Equation 5 [80] Chapter 7): 


C(U) = \frac{\sum_{i \in U} \sum_{j \in U} A_{ij} / (|U| (|U| - 1))}{\sum_{i \in U} \sum_{j \in V - U} A_{ij} / (|U| |V - U|)}    (5)
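
The notion that a cohesive subgroup has more ties inside than outside can be sketched as a density ratio. This is an illustration under stated assumptions, not necessarily the exact normalization of Equation 5: it divides the density of ties within the group by the density of ties from group members to all other users. The function name, the `all_nodes` parameter and the toy graph are hypothetical.

```python
def cohesion(adj, group, all_nodes):
    """Ratio of within-group tie density to group-to-outside tie density."""
    group = set(group)
    outside = set(all_nodes) - group
    if len(group) < 2 or not outside:
        raise ValueError("need at least two members and one outside node")

    # Ordered pairs (i, j) inside the group that are tied.
    inside_ties = sum(1 for i in group for j in group if i != j and j in adj[i])
    # Pairs (i, j) with i inside and j outside the group that are tied.
    outgoing_ties = sum(1 for i in group for j in outside if j in adj[i])

    inside_density = inside_ties / (len(group) * (len(group) - 1))
    outgoing_density = outgoing_ties / (len(group) * len(outside))
    if outgoing_density == 0:
        return float("inf")  # an isolated group is maximally cohesive
    return inside_density / outgoing_density


# Hypothetical graph: a triangle a-b-c with a chain c-d-e attached.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

triangle_cohesion = cohesion(adj, {"a", "b", "c"}, adj.keys())
```

For the triangle, the inside density is 1 and only one of six possible outgoing ties exists, so values above 1 indicate a group that is denser internally than towards the rest of the network.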


We use this measure to investigate the impact of group
cohesion on mobile homophily. Table 7 shows the results
of a correlation analysis between the cohesion measurement
according to Equation 5 and mobile homophily. Mobile
homophily measurements based on spatial overlap
(asynchronous) show a correlation coefficient varying between
r = 0.22, \rho = 0.23 for co-locations within one week and
r = 0.51, \rho = 0.65 for WSCos-Entr. Mobile homophily
measurements based on spatial-temporal (synchronous) overlap
show higher correlation coefficient values, for example
r = 0.48, \rho = 0.63 for social situation rates (s).
The weighting factors used all led to higher
correlation coefficients, with “distance from home” (s-Dist)
proving to be the best weighting factor
(r = 0.66, \rho = 0.81), which confirms that people are
willing to cover greater distances mostly in order to meet
close/important friends.

The relationship between group cohesion and mobile
homophily is shown on log-log plots in Figure 1 for
measurements based on spatial overlap. The results show a
monotonic rather than a linear relationship, which
explains the higher coefficient values for Spearman’s rank
correlation coefficient. Moreover, the log-log plot for the
spatial-temporal overlap shows a higher coefficient of
determination R^2 = 0.74 than the square of the correlation
coefficients, which is evidence for the non-linear nature of
the relationship. We noticed a similar relationship between
group cohesion and mobile homophily measurements based
on spatial-temporal overlap.

[Figure 1 legend: Log SS-Entropy, Log SS-Density, Log SS-Distance, Log SS-User; R^2 values as printed: 0.6913, 0.622, 0.7452, 0.6996]

Figure 1: The correlation between the cohesion measurement
according to Equation 5 and different weighted social
situation rates.

The correlation analysis shows reciprocal impacts between
mobile homophily and social cohesion. On the one hand, people
find others in their geographical vicinity more attractive for
a social relationship; on the other hand, they share
more locations with their closest friends (represented by the
members of their cohesive subgroups).

5. SOCIAL RELATIONS AND NEXT LOCATION PREDICTION

5.1 Related Work 























































The correlation between mobile homophily and social
cohesion shows a statistical dependence between two events
rather than a relationship of cause and effect, in which the
occurrence of one event causes the occurrence of the second
event. The order of occurrence of the events is significant in
causation, in contrast to correlation analysis where it
is of little importance, because causation requires the cause
to precede or coincide with the consequence.

In general, three types of causation can be distinguished,
namely necessary (a cloud is necessary for rainfall),
sufficient (wind is a sufficient cause for the rustling
of the trees) and contributory. According to the definition
of [64] for contributory cause, and according to the INUS
condition for contributory causes in [49], the influence of
the mobility of users on the mobility of their neighbors is
a contributory cause. A visit (cause) of a user (influencer)
to a location can cause (contribute to) a visit (effect) of
friends (followers) to the same location, where either the
cause precedes the effect (asynchronous influence, i.e. the
influencer is absent during the visit of the friends) or it
coincides with it (synchronous influence, i.e. the influencer
accompanies the follower during their visit).

The influence of social networks on predicting individual 
mobility has been investigated by [54, 67, 60, 35, 21, 20, 15, 
72]. 

[15] proposed two mobility models, namely the Periodic
Mobility Model (PMM) and the Periodic & Social Mobility Model
(PSMM), for predicting the locations of an individual. PMM
models the mobility of a user as a time-variant stochastic
process, where the temporal dynamics of human mobility are
captured based on a day-specific periodic transition model.
PMM uses a mixed Gaussian distribution centered around
two states for predicting the future location of the user,
namely home & work states. PSMM considers a third, social
state for evening and weekend activities. Using PSMM
improves the average distance error between the predicted
and the exact location of the user from 2.9% to 2.7%, which
corresponds to an improvement by 10%.

Random Utility Decision Models (RUM) (or discrete choice
models) are statistical procedures that predict the choice of
a user among alternatives based on a utility function [51],
which can take many factors into account. [35] investigated
the interdependence between social network and travel behavior
based on RUM. Their approach models a user’s
utility for a location j based on both the travel cost from a
start location to the destination location j and the social
influence of the destination location. The social influence of
a location is calculated based on factors such as the number of
friends at the location and/or number of friends-of-friends.

[72] analyzed topical influence and proposed an influence
model called Topical Affinity Propagation (TAP). For
each topic they determined a set of representative members 
of a social network and investigated the influence of a mem¬ 
ber on their friends. The proposed Topical Factor Graph 
(TFG) incorporates both user-specific topic distribution and 
network structure into one probabilistic model. TAP learns 
the model parameters using the sum-product algorithm de¬ 
scribed in [43]. 

Mobility models based on Dynamic Bayesian Networks
(DBN) have been proposed by [54, ?, 60]. The influence of
groups on individual human movement has been investigated
by [54]. The authors observed a strong influence of
groups on individuals moving in and between these groups.


An individual either moves independently from groups, or 
within the sphere of influence of a geographical group at 
any point in time. An individual joins a group based on the 
attractiveness (depending on interaction of the user with 
group members) of the group and the difficulty of reaching 
that group. [67] proposed a mobility model that takes
both temporal and social dependencies into account, based on a
DBN approach. The model is evaluated using publicly available
GPS-tagged tweets from two different areas (LA and
NY). [60] proposed a general social influence model
that can be applied to any interaction network in any social
system (including social networks like Foursquare, Facebook,
etc.). Their model, called the dynamical influence model, is based
on a simple mixture approach with fewer parameters than
Hidden Markov Models. They proposed two variants: a model with a static
tie strength matrix, and an extended, more dynamic model
that uses a set of different tie strength matrices for capturing
dynamic changes over time. The authors use a switching
latent state variable that selects the tie strength
matrix currently in use. The dynamical influence model captures
how the state of one user is influenced by the state of their
neighbors.

5.2 Social Context and VOMM-Based Loca¬ 
tion Prediction 

In previous work [10] we introduced a mobility
model using an adapted general-purpose algorithm called
Prediction by Partial Matching (PPM) [13], which is based
on variable-order Markov models (VOMM). The proposed mobility
model considers both spatial and temporal context when
predicting the next location of a mobile user [10]. PPM,
in contrast to Bayesian Network (BN) models, predicts the
value of a random variable based on a subset of random
variables whose size varies depending on the specific realization
of observed variables in the training data, called the context $s$.
The size of the context $n = |s|$ represents the order of the
model. The variable order of PPM alleviates the negative
impact of missing data and of the zero-frequency problem. VOMM
approaches generally use a tree structure to alleviate the
problem of transition matrix sparseness.

Given the current context $s$, PPM estimates the probability
of a location $q$ to be visited after $s$ by estimating
the conditional probability $P(q \mid s)$. PPM assigns a probability
mass $P(escape \mid s)$ to symbols that do not appear after
context $s$; the remaining probability mass $1 - P(escape \mid s)$
is distributed among the symbols appearing after $s$. Equation 6
determines the probability of any symbol $q$ occurring
after context $s$ recursively.

$$
P(q \mid s) =
\begin{cases}
\hat{P}(q \mid s) & \text{if } q \in \Sigma_s \\
P(escape \mid s)\, P(q \mid suf(s)) & \text{else}
\end{cases}
\tag{6}
$$

where $\Sigma_s$ is the set of symbols appearing after context $s$ and
$suf(s)$ denotes the longest suffix of $s$. The probability of any
symbol appearing after an empty context $|s| = 0$ is
$P(q \mid \varepsilon) = \frac{1}{|\Sigma|}$. Hence, PPM is able to assign a probability mass to
any symbol independently of its occurrence in the training
sequence; thus PPM does not contravene Cromwell’s rule
(compare with the rule of succession in general statistics
[85]). For symbol $q$ and context $s$, let $C(sq)$ be the counter
that counts the occurrences of $sq$. Equation 7 and Equation 8
are estimations of the probabilities $\hat{P}(q \mid s)$ and $P(escape \mid s)$,
respectively:

We integrate the temporal context into the PPM model by
making use of the inclusion semantics of human time periods
(week > day > hours), which has the same hierarchical
structure as the PPM tree [10]. For example, a spatial
context $s$ can appear in temporal contexts of the form
(Workingday, Tuesday, 7pm).
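
The recursive escape mechanism of Equation 6 can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the class name `PPMSketch` and its methods are hypothetical, temporal context is ignored, and the estimators are assumed to follow the widely used PPM "method C", i.e. $\hat{P}(q \mid s) = C(sq) / (|\Sigma_s| + \sum_{q'} C(sq'))$ and $P(escape \mid s) = |\Sigma_s| / (|\Sigma_s| + \sum_{q'} C(sq'))$.

```python
from collections import defaultdict


class PPMSketch:
    """Toy PPM predictor over location sequences (method C estimators assumed)."""

    def __init__(self, alphabet, max_order=2):
        if max_order < 1:
            raise ValueError("max_order must be at least 1")
        self.alphabet = set(alphabet)
        self.max_order = max_order
        # counts[context][symbol] = how often `symbol` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequence):
        sequence = list(sequence)
        for i in range(1, len(sequence)):
            for order in range(1, self.max_order + 1):
                if i - order < 0:
                    break
                context = tuple(sequence[i - order:i])
                self.counts[context][sequence[i]] += 1

    def prob(self, q, context):
        # Only the most recent max_order symbols form the context s.
        return self._prob(q, tuple(context)[-self.max_order:])

    def _prob(self, q, s):
        if not s:
            # Empty context: uniform probability 1/|Sigma|.
            return 1.0 / len(self.alphabet)
        followers = self.counts.get(s)
        if not followers:
            # Context never observed: fall through to the shorter suffix.
            return self._prob(q, s[1:])
        total = sum(followers.values())
        distinct = len(followers)
        if q in followers:
            return followers[q] / (total + distinct)
        escape = distinct / (total + distinct)
        return escape * self._prob(q, s[1:])
```

A location that never followed the given context still receives a non-zero probability, because the recursion eventually reaches the empty context.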

5.2.1 Integration of Social Context 

The Socio-Spatial-Temporal (SOST) PPM is an improvement
of the spatial-temporal PPM model that incorporates
social influence factors. Generally, we distinguish between
two types of social influence factors, namely synchronous
specific and general social trend influences. A synchronous
specific influence factor represents the cases in which a user and
a set of their friends are involved in the same social situation
(when they visit a restaurant, for example). We refer to the
friends from whom the influence originates as influencers.
Thus, synchronous social influence has two preconditions:
first, the user must currently be involved in a social situation;
second, the location histories of friends must be available.
We introduce synchronous specific influence factors in more
detail in the next subsection.

General social influence factors represent the general move¬ 
ment patterns and social trends in a user’s community, for 
example the favorite bar or club of their circle of friends, a 
hip new restaurant in the city or an inexpensive shopping 
mall, etc. The users in the same circle of friends or the same
community share common interests, hobbies, thoughts, beliefs,
etc., which precipitate interest in common locations,
or similar movement behavior in the same spatial-temporal 
context. General social influence has only one precondition, 
namely the availability of location histories of friends. 

A precondition of synchronous specific social influence factors
is the presence of the users at the same location during
a short period of time $\Delta t$. The correlation analysis has
shown a moderate to strong correlation between mobile
homophily and social cohesion when setting $\Delta t$ to one
hour; therefore we assume that friends who are present at
a location within one hour are involved in the same social
situation.

The social context of a visit to a location from the point
of view of a user $u_i$ is a tuple $s = \langle U, q, \lambda, t, c \rangle$, where
$U$ is the set of users present at location $q$ representing a
synchronous specific social influence factor, $\lambda$ represents the
set of temporal features (such as work day or weekend, day
of week, hour of day, etc.) extracted from the time stamp $t$
of the visit, and $c$ is a counter that bookkeeps the occurrences
of that specific social influence factor. Let $u_i$ be the user
whose next location is going to be predicted and $N(i)$ the
neighbors (friends) of $u_i$; then $U \subseteq N(i) \cup \{u_i\}$.
According to the users present in $U$, the social context
of a visit can be categorized into three different classes of
synchronous specific social influence factors $s$:

• Class I social influence factors - a social situation
that contains the user $u_i$ and at least one of their
neighbors, i.e. $u_i$ is visiting a location with at least
one of their friends.

• Class II social influence factors - a social situation
that contains at least two neighbors of the user $u_i$ without
the presence of $u_i$ in the social situation.

• Class III individual-based social influence factors -
single visits of neighbors without the presence
of other users.

The inclusion of class II & III social influence factors in¬ 
jects a vast amount of extra knowledge into the mobility 
model of an individual. It helps predict locations which 
have been visited by friends even if the user themselves has 
never been there before. The prediction of locations where 
the user has never been before is almost impossible if only 
the location history of the individual user is available (but 
may be possible to a limited extent if additional data sources 
such as their personal calendar are considered). 
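
The three classes can be told apart with a few set operations. The following is a minimal sketch, not the implementation used in the experiments; the function name `classify_social_situation` is hypothetical, and users outside $N(i) \cup \{u_i\}$ are assumed to be ignored.

```python
def classify_social_situation(present_users, user, neighbors):
    """Return "I", "II" or "III" for the users present at a location.

    present_users: ids of the users observed at the location within delta t
    user:          id of the user u_i whose next location is predicted
    neighbors:     ids of the friends N(i) of u_i
    Returns None if the visit carries no social influence factor for u_i.
    """
    # Assumption: only u_i and their neighbors are relevant (U is a subset
    # of N(i) united with {u_i}); strangers at the location are dropped.
    present = set(present_users)
    present_neighbors = present & set(neighbors)
    if user in present and present_neighbors:
        return "I"    # u_i visits the location together with friends
    if user not in present and len(present_neighbors) >= 2:
        return "II"   # at least two friends meet without u_i
    if user not in present and len(present_neighbors) == 1:
        return "III"  # a single friend visits the location alone
    return None       # u_i alone, or nobody relevant present
```

A visit of the user on their own yields `None`, since it is already covered by the individual spatial-temporal model.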

The probability that a user visits a location, taking into account
the spatial-temporal context and synchronous specific
social influence factors, is captured by the conditional probability
$P(q \mid U, s, \lambda)$. We assume that the current social situation
$U$ is independent of the spatial context. Equation 9
is an estimation of the probability mass $P(q \mid U, s, \lambda)$:

$$
P(q \mid U, s, \lambda) = \underbrace{P(q \mid U, \lambda)}_{\text{Social influence}} \cdot \underbrace{P(q \mid s, \lambda)}_{\text{Individual mobility}} \tag{9}
$$

The right term of the equation represents the probability 
of visiting a location given both spatial and temporal con¬ 
texts, which can be estimated from the individual location 
history as in the previous chapters. The left term repre¬ 
sents influences arising from the current social situation and 
temporal context of the user. 

The inclusion of location histories of friends causes an explosion
in the number of locations in the alphabet $\Sigma$; thus
we manage social influence factors in a separate tree called the
Socio-Spatial-Temporal (SOST) tree. Table 8 shows the features
and their domains that are used in the SOST PPM
VOMM tree.


Variable         Domain                        Description
---------------  ----------------------------  ------------------------------------------------------------
$\Sigma_{loc}$   $\{l_1, l_2, \dots\}$         The set of locations visited by the user
$W$              $\{Wd, We\}$                  A binary variable representing whether it is a weekend day or a work day
$D$              $\{Sun, Mon, \dots, Sat\}$    The day of week
$S_{\Delta t}$   $\{S_1, S_2, \dots, S_n\}$    The time slots obtained by dividing the hours of day by $\Delta t$; setting $\Delta t = 1$ means that each hour of day represents a slot
$U$              $\{u_1, u_2, \dots, u_j\}$    The set of users present in the current social situation

Table 8: The features included in the SOST PPM VOMM
model.


The nodes of the SOST tree correspond to either locations
or temporal features from the location histories of a
user $u_i$ and their friends $N(i)$. Each node of the SOST tree
manages a set of tuples of the form $\langle U, t, c \rangle$ for the social situations
in the location histories of the friends at the given
spatial-temporal context associated with the node. $U$ represents
the users involved in a social situation, $t$ the time
stamp of the latest occurrence of $U$, and $c$ is a counter for
bookkeeping the occurrences of $U$. Figure 2 shows an example
SOST PPM VOMM tree and zooms into a node in
order to illustrate how SOST PPM manages different social
influence factors.



Figure 2: An example SOST PPM VOMM tree of a user
with ID = 23. The nodes immediately under the root node
are labeled with locations; the nodes at deeper levels are
labeled with temporal features such as work and weekend
days and time slots of the day. Unlike the PPM VOMM tree
in figure (??), each node in the SOST PPM VOMM tree
has multiple tuples for managing the occurrences of social
influence factors. The figure zooms into the red node in order
to illustrate how SOST PPM VOMM manages the three
classes of social influence factors at location $q_1$ on weekend
days. Each social influence factor is a tuple consisting of the
set of users who together form the social influence factor
(the numbers in curly braces represent the IDs of the users),
the time stamp of its latest occurrence and a counter (the
number in parentheses) for bookkeeping the number of its
occurrences. Social influence factors of different classes are
colored differently.


Upon detecting a social influence factor, we traverse
the SOST tree to find a corresponding node according
to the current location $q$ and temporal context $\lambda$ of each
of the social influence factors. We insert a new path if a
corresponding node does not yet exist in the tree (necessary
for class II & III social influence factors). We initialize a
new counter for a social influence factor that occurs for the
first time. The integration of social context into our mobility
model thus simply corresponds to adding new paths to the SOST
PPM VOMM tree and incrementing or initializing counters for
each different social influence factor. SOST PPM VOMM
increments the occurrence of a synchronous specific social
influence factor according to Equation 10:


$$
C_U(q, \lambda) = C_U(q, \lambda)\,\bigl(\varphi(U, t) + 1\bigr) \tag{10}
$$

where the factor $\varphi(U, t)$ represents the degree of drift of
the social situation $U$, because social situations are subject
to decay over time. Members of the same social group
exhibit similarities in their beliefs, interests, hobbies, goals,
activities, emotional needs, feeling of security, etc. Group affiliation
and the aforementioned similarities are subject to decay
over time; thus the influence of previous social situations decreases
the longer their last occurrence lies in the past. The SOST
tree uses two different functions for estimating the degree of
drift $\varphi(U, t)$:

$$
\varphi_1(U, t) = (1 - \beta)^{t - t_l} \tag{11}
$$

$$
\varphi_2(U, t) = e^{-\beta (t - t_l)} \tag{12}
$$

where $\beta$ is a hyper-parameter that controls the degree of
drift, $U$ is a synchronous specific social influence factor and
$t_l < t$ is the time stamp of the last occurrence of $U$. The
unit of the factor $t - t_l$ is the average stay time $\Delta t$ in hours.
If a user spends on average three hours at a location, the
social influence factor decays every three hours by a factor
of $1 - \beta$ or $e^{-\beta}$.
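
The two drift functions of Equations 11 and 12 can be sketched as follows. This is an illustration under stated assumptions rather than the evaluated implementation: the function names are hypothetical, time stamps are assumed to be given in hours, and the elapsed time is expressed in units of the average stay time as described above.

```python
import math


def elapsed_units(t, t_last, avg_stay_hours):
    """Elapsed time since the last occurrence, in units of the average stay time."""
    if t < t_last:
        raise ValueError("t must not precede the last occurrence")
    return (t - t_last) / avg_stay_hours


def drift_power(beta, t, t_last, avg_stay_hours):
    """Drift factor of the form (1 - beta) ** (t - t_l)."""
    return (1.0 - beta) ** elapsed_units(t, t_last, avg_stay_hours)


def drift_exponential(beta, t, t_last, avg_stay_hours):
    """Drift factor of the form exp(-beta * (t - t_l))."""
    return math.exp(-beta * elapsed_units(t, t_last, avg_stay_hours))
```

With an average stay time of three hours, one elapsed unit corresponds to three hours, so `drift_power(0.05, 3.0, 0.0, 3.0)` yields the single-step factor $1 - \beta$.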

5.2.2 Synchronous Specific Social Influence Estimation

SOST PPM VOMM has to estimate the social influence
part $P(q \mid U, \lambda)$ of Equation 9 in order to predict the future
location of a user using synchronous specific social influence
factors and both spatial and temporal features. Social situations
do not have an order of occurrence like the spatial
context and do not follow an inclusion semantics like the temporal
context; therefore the standard escape mechanism of
PPM is not applicable. SOST PPM instead uses a similarity
measure based on the Jaccard Similarity Coefficient (JSC)
[4] for comparing two different social influence factors. JSC
is defined as the size of the intersection divided by the size of
the union of the two sets of users involved in the two social influence
factors (Equation 13):


$$
Jacc(U, \hat{U}) = \frac{\sum_{u_j \in U \cap \hat{U}} \tau(u_i, u_j)}{\sum_{u_j \in U \cup \hat{U}} \tau(u_i, u_j)} \tag{13}
$$

where $\tau(u_i, u_j)$ is the tie strength between the users $u_i$ and
$u_j$. We calculate the tie strength between two users using the
amount of spatial overlap between their movement histories.
Let $N(i)$ be the set of neighbors of user $u_i$; SOST PPM
VOMM calculates tie strength based on spatial overlap only
between user $u_i$ and a friend from the set $N(i)$ according to
Equation 14:




'^u,,eN{i) Coll{Uk)u!{l) 


(14) 


$\omega(l)$ is a function that weights visits to location $l$ according
to weighting factors such as the distance from home, the
density or the entropy of a location.

SOST PPM uses Equation 15 to determine the amount of
influence arising from past synchronous specific social influence
factors at a location on the current visit of a user to
that location by modifying the counter of the node $\eta$ corresponding
to the current spatial-temporal context $(q, \lambda)$:

$$
C_{s=(U, \lambda, q)} = \sum_{\hat{U}} C_{(\hat{U}, \lambda, q)} \cdot Jacc(U, \hat{U}) \tag{15}
$$

where $U$ is the set of users involved in the current social situation
and $C_{(\hat{U}, \lambda, q)}$ is the counter for the occurrences of a synchronous
specific social influence factor with a set of users $\hat{U}$
at location $q$ and temporal context $\lambda$. SOST PPM makes use
of Equation 15 for estimating the probability mass $P(q \mid U, \lambda)$.
Let $c$ be the set containing a node $\eta = (q, \lambda)$ corresponding
to a location $q$ and temporal context $\lambda$ and its ancestors;
Equation 16 estimates the probability mass $P(q \mid U, \lambda)$:

(16)


Equation 17 represents another possibility to estimate the
probability mass $P(q \mid U, \lambda)$ by changing the denominator of
Equation 16:

$$
P(q \mid U, \lambda) = \frac{C_{s=(U, \lambda, q)}}{\sum_{\hat{U}} C_{s=(\hat{U}, \eta)}} \tag{17}
$$

The denominator in Equation 17 is smaller than the denominator
in Equation 16, because it depends only on the
node $\eta$ and not on its ancestors. Thus Equation 17 gives synchronous
specific social influence factors more importance
than Equation 16.
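
The similarity-weighted counting described in this subsection can be sketched in a few lines. The sketch rests on several assumptions that are not taken from the evaluated implementation: tie strengths are passed in as a ready-made dictionary `tie` (user id to $\tau(u_i, u_j)$), each tree node is reduced to a dictionary from stored user sets to counters, the Jaccard coefficient is weighted by tie strength, and the score is normalized per node in the spirit of Equation 17. All function names are hypothetical.

```python
def weighted_jaccard(current, stored, tie):
    """Tie-strength-weighted Jaccard similarity of two sets of users."""
    current, stored = set(current), set(stored)
    denominator = sum(tie.get(u, 0.0) for u in current | stored)
    if denominator == 0.0:
        return 0.0
    numerator = sum(tie.get(u, 0.0) for u in current & stored)
    return numerator / denominator


def social_counter(current, factors, tie):
    """Sum of the stored counters at one node, weighted by similarity (cf. Equation 15)."""
    return sum(count * weighted_jaccard(current, users, tie)
               for users, count in factors.items())


def social_influence(current, nodes, tie):
    """Score every candidate location, normalizing by the counters of its own node only."""
    scores = {}
    for location, factors in nodes.items():
        total = sum(factors.values())
        scores[location] = social_counter(current, factors, tie) / total if total else 0.0
    return scores
```

For example, with `tie = {2: 0.5, 3: 0.3, 4: 0.2}` and a current social situation `{2, 3}`, a node that stores the same pair of friends contributes its full counter, whereas a node that only stores user 4 contributes nothing.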

5.2.3 General Social (Trend) Influence Estimation 

A user might be under social influence even without the
presence of observable influencers. Social trends in the community
of a user (for example a hip bar or restaurant) or
asynchronous social influences (for example when friends
recommend/share locations to/with each other using other
media like e-mail, phone, online social network platforms,
etc.) are examples of general social influences. The inclusion
of general social influence in a mobility model is difficult
because the influence arises from an unobservable subset of the
community (the neighbors $N(i)$) of the user and is transmitted
with a time delay ranging from a few hours to a few
weeks. Therefore, we consider general social influences only
in cases where the predicted location has a smaller probability
than the probability of an unknown location (the
probability $P(escape \mid s)$ in Equation 8 after escaping to an
empty context $s = \varepsilon$). In addition to SOST PPM, we use an
additional (general) VOMM tree ($M'$) that integrates the
trajectories of all friends. The extended mobility model predicts
the future location of a user $u_i$ in two steps. The SOST
tree is first used to predict the next location of the user $u_i$. If
the probability of the predicted location is greater than the
escape probability, the model returns this location and terminates;
otherwise it switches to the general VOMM tree $M'$ in
order to predict the next location of the user (Equation 18):

$$
q^{*} =
\begin{cases}
\arg\max_{q} P(q \mid U, s, \lambda) & \text{if } \max_{q} P(q \mid U, s, \lambda) > P(escape \mid s) \\
\arg\max_{q} P'(q \mid s, \lambda)_{M'} & \text{else}
\end{cases}
\tag{18}
$$
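
The two-step decision of Equation 18 amounts to a simple fallback rule. In the following sketch the two models are abstracted away: `sost_probs` and `general_probs` are hypothetical dictionaries mapping candidate locations to the probabilities produced by the SOST tree and by the general tree $M'$, and `escape_prob` stands for $P(escape \mid s)$.

```python
def predict_next_location(sost_probs, general_probs, escape_prob):
    """Return the SOST prediction if it beats the escape probability, else fall back to M'."""
    best = max(sost_probs, key=sost_probs.get) if sost_probs else None
    if best is not None and sost_probs[best] > escape_prob:
        return best
    if general_probs:
        # Fallback: most probable location in the tree built from all friends.
        return max(general_probs, key=general_probs.get)
    return best  # nothing better is available
```

The general tree is therefore consulted only when the socio-spatial-temporal model is less confident than the escape probability.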


5.3 Results 

We evaluate the performance of the SOST PPM model using
the Foursquare data-set from section 3. We define three
different limits of predictability in order to assess the performance
of SOST PPM. The users visit on average $|L| = 62.35$
locations and the average entropy is found to be 3.48, which
means that the average minimum number of locations necessary
for producing an average entropy of 3.48 is $e^{3.48} = 32.46$.
This corresponds to a lower bound of predictability of
1/32.46 = 3% [71]. In almost 38% of the cases the users
visit new locations where they have never been. Further, a
user in the dataset makes on average 2.04 check-ins at a location.
The average number of check-ins $x$ for the remaining 62% of locations
follows from $0.38 + x \cdot 0.62 = 2.04$, i.e. $x = 2.68$. A mobility
model needs at least one of the 2.68 check-ins for learning;
thus a mobility model can at most achieve an accuracy of
1.68/2.68 = 63% for 62% of the check-ins, which corresponds
to an upper bound of predictability of 0.63 * 0.62 = 39%. Finally,
we make use of Fano’s inequality ([23] as cited by [71])
for calculating the maximum predictability $\Pi^{\max}$ based on
the entropy and the number of locations visited for each
user. The average value of $\Pi^{\max}$ over all users is found to
be less than 29%. Using the average entropy of 3.48 and
the average number of locations visited (62.35)
over all users, the average predictability increases to < 31%.
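
The three limits can be reproduced numerically. The sketch below is illustrative only: the function names are hypothetical, the entropy is assumed to be measured in nats (which is consistent with $e^{3.48} = 32.46$), and the Fano bound is obtained by solving $S = H(\Pi) + (1 - \Pi) \ln(N - 1)$ for $\Pi$ by bisection, with $H$ the binary entropy function.

```python
import math


def lower_bound(entropy_nats):
    """Uniform guessing among the exp(S) locations that explain the entropy."""
    return 1.0 / math.exp(entropy_nats)


def upper_bound(new_location_share, avg_checkins):
    """Best case if one check-in per revisited location is spent on learning."""
    revisit_share = 1.0 - new_location_share
    x = (avg_checkins - new_location_share) / revisit_share
    return (x - 1.0) / x * revisit_share


def fano_max_predictability(entropy_nats, n_locations, tol=1e-10):
    """Solve S = H(p) + (1 - p) * ln(N - 1) for p on (1/N, 1) by bisection."""
    def gap(p):
        h = -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)
        return h + (1.0 - p) * math.log(n_locations - 1.0) - entropy_nats

    lo, hi = 1.0 / n_locations, 1.0 - 1e-12
    if gap(lo) < 0.0:
        raise ValueError("entropy exceeds ln(N); no solution in (1/N, 1)")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) > 0.0:
            lo = mid  # the bound lies further to the right
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Called with the average values reported above (`entropy_nats=3.48`, `n_locations=62.35`, `new_location_share=0.38`, `avg_checkins=2.04`), the functions return values close to the stated 3%, 39% and just under 31%.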

In almost 38% of the cases the users move to new (not yet seen in
their location histories) locations. Three quarters of these locations
have previously been visited by friends; thus the share
of new locations reduces to 9.8% within the circle of friends of
the users. Further, a check-in of a user at a location is followed
in 13% of the cases by a check-in of a friend at that
location within one hour. Furthermore, two thirds of the active
users are involved in social situations. We conclude from
the aforementioned statistics that there is a potential of at least 10%
for improving the prediction accuracy of the mobility model
based on social influences.

The spatial-temporal (ST) PPM model is able to predict 
the next location of a user with an accuracy of 18.6%. The 
accuracy using SOST PPM model increases to 21.2% and 
22.5% when estimating synchronous specific social influences 
according to Equation 16 and Equation 17 respectively. Es¬ 
timation according to Equation 17 gives social influences 
more importance than Equation 16, hence the better performance.
The absolute improvements in accuracy correspond
to 0.026 and 0.039, the relative improvements in accuracy
correspond to 14% and 21%, respectively.

The drift functions increase the prediction accuracy of
SOST PPM to 23.1% ($\beta = 0.02$) and 23.8% ($\beta = 0.05$) using
the estimators according to Equation 17 and Equation 16,
respectively. The improvements in accuracy correspond to
0.006 and 0.026 absolute improvement, and to 2.7% and
12% relative improvement in accuracy, respectively. The
significance of the improvement in accuracy is confirmed
by two-sided unpaired t-tests with P(e) = 2.5 * 10~®^ and
P(e) = 0.02, respectively. The values of $\beta$ imply that social
influences decay in three to six weeks.

Table 9 contains the cumulative improvements in accu¬ 
racy when incorporating an additional class of social influ¬ 
ence factors. The accuracy improves to 19.78%, 20.95% and 
22.04% by additionally incorporating class I, II and III syn¬ 
chronous specific social influence factors. The total absolute 
improvement in accuracy by incorporating all classes of syn¬ 
chronous specific social influence factors is 0.0344, the rel¬ 
ative improvement in accuracy corresponds to 18.5%. The 
significance of the improvements is confirmed by corresponding
two-sided unpaired t-tests (Table 9). The empirical results
impressively underline the importance of social influence factors
for enhancing the accuracy of next location prediction.

Almost one third of the users were not involved in any
social situation; the incorporation of general (trend) social
influences is the only possibility of enhancing their next location
prediction. The accuracy increases to
23.8% when general social influences are additionally incorporated.
The prediction accuracy increases by almost 0.0178
due to the incorporation of general social influences. The total absolute
improvement in accuracy is 0.0522, which corresponds
to a relative improvement in accuracy of 28%. The significance
of the improvement is confirmed by a two-sided unpaired
Student’s t-test (P(e) = 2.0*10“^®). The importance
of general social influences is emphasized by considering only
users who are not involved in any social situation. The incorporation
of general social influences leads to an absolute
improvement in accuracy of 0.0221, which corresponds to a
relative improvement in accuracy of 11%.

s-class        Absolute Impr.     Relative Impr. %   Two-Sided Unpaired T-Test P(e)
-------------  -----------------  -----------------  ------------------------------
Class I        0.0088 (0.0118)    4.7 (6.1)          0.0012 (0.00113)
Class I & II   0.0235 (0.0313)    12.6 (16.3)        3.8 * 10“’^“ (1.4 * 10““’*)
Class I-III    0.0344 (0.0458)    18.5 (23.9)        1.4 * 10”*” (1.5 * 10“™)

Table 9: Empirical results. Column 2 represents the absolute improvement
in accuracy compared to the ST PPM VOMM model, column
3 the relative improvement in accuracy compared to the ST
PPM VOMM model, and column 4 the results of two-sided unpaired t-tests
(probability of error P(e)) showing the significance of the
improvements. The numbers in parentheses represent the corresponding
values for the portion of users who are involved in at least one social
situation (setting $\beta = 0.05$, $\Delta t = 1$ and $x = 3$).

Fano’s inequality [23] yields an average predictability of 29%
for the users in the Foursquare data-set. SOST PPM
achieves a prediction accuracy of 23.8% out of a maximum
predictability of 29%, i.e. it reaches at least
23.8/29 > 82% of the achievable predictability, which underlines
the prediction power of SOST PPM. The impact
of social influences on improving location prediction can be
demonstrated convincingly by considering only users who were
involved in at least one social situation (the numbers in parentheses
in Table 9): the improvement in accuracy increases to approximately
0.0615, which corresponds to a relative improvement
in accuracy of 32%.

In 437 231 cases the users in the data-set visit (unknown)
locations that have not yet been seen in their own location
history. The integration of social networks resulted in a correct
prediction of the location of users in 36 469 of these cases.
The absolute improvement in total prediction accuracy is
0.0319, which corresponds to 61% of the total absolute improvement
in accuracy. The significance of the improvement
is confirmed by a two-sided unpaired Student’s t-test
(P(e) = 8 * 10-“).

The uncertainty in predicting the next location of a mobile
user is highest during the evening hours of working days and
on weekend days, when users spend time on recreational
activities. Figure 3 shows that the largest improvement in
accuracy due to social influences occurs during these time
periods. On work days (blue bars) the improvement has two
peaks, during lunch and evening hours.
On weekend days (red bars) the largest improvement
covers the hours between 11 a.m. and 12 p.m. These are the
typical periods in which people spend time with their friends,
for example having lunch, a drink after work, or a window
shopping stroll around the city. Figure 3 again underlines
the importance of social influences for improving the prediction
accuracy during time periods with high uncertainty,
when the individual mobility model fails to find mobility
patterns.

5.3.1 Social Network Measurements 

The mobility of most of the users (followers) in the dataset
is influenced by a small subset of their friends (influencers).
Fewer than 1% of the users have more than 20
potential influencers. The inclusion of location histories of
only two friends is sufficient to achieve a significant relative
improvement in accuracy of 21%. The relationship between
the number of influencers and the improvement in accuracy
shows a positive trend, which is confirmed by a moderate
positive correlation according to Pearson’s
correlation coefficient ($r = 0.37$) and a stronger positive correlation
according to Spearman’s rank correlation coefficient
($\rho = 0.49$) with a probability of error of zero ($P(e) = 0.0$) (Figure 4).

Figure 3: The proportion of improvement in accuracy over
the hours of weekend days (red bars) and work days (blue
bars).

Figure 4: The relationship between the number of influencers
(y-axis) and absolute improvement in accuracy (x-axis)
(setting $\Delta t$ to one hour).

The size of the location histories injected by the influencers
exhibits a similar trend as the number of influencers. The inclusion
of only 50 visits by influencers is sufficient to improve
the accuracy by a significant 0.0266 absolute improvement
and 14% relative improvement. The improvement in accuracy
shows a positive trend with the size of the injected location
histories of friends. The positive trend is confirmed by a
moderate positive correlation of 0.23 according to Pearson’s
correlation coefficient, and a similar positive correlation of
0.21 according to Spearman’s correlation coefficient, with a
probability of error of zero ($P(e) = 0.0$).

Outgoing, talkative, energetic behavior characterizes
extraverted users, whereas reserved and solitary
behavior characterizes introverted users [73].
As expected, extraverted users are involved in more social situations
than introverted users; hence, their mobility
behavior is more predictable via social amendments to the
spatial-temporal ST PPM approach. The relationship between
the social situation rate and the improvement in accuracy in
Figure 5 shows a positive trend that confirms this behavior.
The positive trend is underlined by a very strong correlation
according to both Pearson’s (0.71) and Spearman’s (0.61)
correlation coefficients.

Figure 5: Absolute accuracy improvement correlates with
the average social situation rate ($r = 0.71$, $\rho = 0.61$, $P(e) = 0.0$).

The number of social situations that have been integrated
in the SOST tree is 149 700. Almost 70% of these social
situations are among members of the same 2-plexes; thus
most social influence is transferred between members of the
same cohesive subgroup, which is in accordance with the
results of the correlation analysis in section 4.3.2.



Figure 6: The relationship between the percentage of total 
absolute improvement in accuracy and the size of social situ¬ 
ations follows a power law with a coefficient of determination 
of 0.99. 

The size of a social situation is defined by the number of
involved users. The size of most social situations varies between
two and five, and only a small portion are larger. The
relationship between the size of social situations and the cohesion
according to Equation 5 follows a power law: the larger
the social situation, the lower the cohesion. The relationship
between the improvement in accuracy and both the size
and the measure of cohesion of social situations follows power laws
with coefficients of determination of $R^2 > 0.99$ (Figure 6) and
$R^2 > 0.95$ (Figure 7), respectively. The improvement in accuracy
shows a strong positive correlation with the measure of cohesion,
in accordance with the correlation analysis in section
4.3.2.
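
A power-law relationship $y = a \cdot x^{k}$ and its coefficient of determination can be checked with an ordinary least-squares fit in log-log space. The following sketch shows one common way of doing so; it is an assumption that the coefficients reported here were obtained in a comparable way, and the function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**k by linear regression on (ln x, ln y).

    Returns (a, k, r2), where r2 is the coefficient of determination
    of the linear fit in log-log space. All xs and ys must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    syy = sum((y - mean_y) ** 2 for y in log_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    k = sxy / sxx
    a = math.exp(mean_y - k * mean_x)
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0.0 else 1.0
    return a, k, r2
```

A negative exponent $k$ corresponds to the decreasing relationship described above, in which larger social situations are less cohesive.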



Figure 7: The relationship between the percentage of total 
absolute improvement in accuracy and the average measure 
of cohesion in the social situations follows a power law with 
a coefficient of determination of 0.96. 

Humans have spatial, temporal, cognitive and emotional limitations
that prevent them from maintaining all their relationships
with the same intensity [33]. Dunbar suggests
that the number of neighbors with whom a user can maintain
stable cognitive social relationships is 150 ([19] as cited
by [3]). Almost 9% of the users in the Foursquare dataset
have more than 150 neighbors, and 22 of the users have at
least 1000 neighbors, which means that these users have a
lot of weak ties, because intuitively no one can cognitively
maintain such a number of relationships. A user in a social
network maintains strong ties to a small subset (2-plex) of
their friends, most of whom are in touch with one another
[33], and weak ties to the remaining friends, the acquaintances
(in accordance with Granovetter). The acquaintances
in turn have their own subsets of strong ties; thus
weak ties bridge the gap between different communities and
social circles [32] and are important for transmitting general
social trends beyond the borders of a cohesive subgroup.
The information of users in a cohesive subgroup overlaps to a
high degree due to the intensity of their interaction, which
results in homogeneity in their behavior, life styles, emotional
needs, thoughts, beliefs, movements, goals, etc. Two
users connected via a weak tie exchange more novel
information [33] because of the heterogeneity in their information.
The heterogeneity occurs because each of the two
users spends time and interacts with people whom the other
user does not know [33].

Degree centrality is a notion that refers to the extent to
which a user is connected to others. Gladwell refers to
central users as connectors, a few people who have the
extraordinary knack of making friends and acquaintances and
who can bring users from different social circles together
([30], pages 38-41, as cited by [5]). Central users have more
friends than they can cognitively maintain strong
relationships with. Hence, most of their ties are weak ties
that are enmeshed in different cohesive subgroups. Central
users are therefore important, because they can bridge the
gap between many different social communities and thus
transmit social influence between these communities.
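
As a minimal sketch of the degree-based notion of centrality used here, the following Python fragment counts distinct neighbors in an undirected friendship list and flags users whose degree exceeds a threshold. The function names, the edge-list input format and the toy data are assumptions made for illustration; only the threshold of 150 is taken from the text.

```python
from collections import defaultdict

DUNBAR_NUMBER = 150  # cognitive limit on stable relationships cited in the text


def degrees(edges):
    """Return {user: number of distinct neighbors} for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:  # ignore self-loops
            continue
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {user: len(ns) for user, ns in neighbors.items()}


def central_users(edges, threshold=DUNBAR_NUMBER):
    """Users whose degree is strictly greater than the threshold."""
    return sorted(u for u, d in degrees(edges).items() if d > threshold)


# Hypothetical toy network: one hub connected to five users, plus one extra tie.
edges = [("hub", f"u{i}") for i in range(5)] + [("u0", "u1")]
deg = degrees(edges)
hubs = central_users(edges, threshold=3)  # small threshold for the toy data
```

With real data the default threshold of 150 would single out the users who, following the argument above, cannot maintain all of their ties as strong ties.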



Figure 8: The average absolute improvement in accuracy
(plotted against degree) shows a negative trend as the degree
increases; the correlation coefficients were found to be
r = -0.26, ρ = -0.29, P(e) = 0.0.

The Foursquare dataset contains 147,900 social situations.
About 70% of these social situations take place between
members of the same maximal 2-plexes; in the remaining 30%
of cases the users interact with their acquaintances. We
enclosed the (central) users with more than 1,000 neighbors
(many weak ties) in red ellipses in the following three
figures. Figure 8 represents the relationship between the
average degree and the improvement in accuracy. The figure
shows that the prediction accuracy of central users is only
slightly improved by using the location histories of their
friends, because a central user interacts more with their
weak ties from different social communities. Nevertheless,
central users are important for transmitting social
influence, because a check-in of a central user can
potentially influence the mobility of 1,000 neighbors.
Central users are trend setters or trend transmitters between
different social communities; they are followed by others
rather than following others.

Figure 9 shows the relationship between degree and the
average size of the location history. The plot shows a
positive trend (r = 0.16, ρ = 0.15, P(e) = 0.00016). The
positive trend indicates that central users get about a lot
and are explorative in nature. They are trend setters and can
influence the mobility of their neighbors. Figure 10 shows a
positive trend between degree and the average number of
locations visited for the first time (r = 0.33, ρ = 0.25,
P(e) = 0.0), confirming the explorative nature of central
users. The above results underline the importance of central
users in a social network for transferring influence and
their contribution to improving the prediction accuracy of
their neighbors.
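
The trends in this section are reported as Pearson (r) and Spearman (ρ) correlation coefficients. The sketch below shows one common way to compute both with the Python standard library; Spearman's coefficient is obtained as the Pearson correlation of average ranks. The function names and the toy numbers are illustrative assumptions and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: improvement falls monotonically as degree grows.
degree = [1, 2, 3, 4, 5]
improvement = [10.0, 8.0, 7.0, 3.0, 1.0]
r = pearson(degree, improvement)
rho = spearman(degree, improvement)
```

Because the toy relationship is strictly decreasing, ρ is -1 up to floating-point rounding, while r is negative but not exactly -1, which illustrates why the two coefficients reported for a figure can differ.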

5.3.2 Location History Measurements 



Figure 9: A plot showing the positive trend between the
degree (x-axis) and the average location history (r = 0.33,
ρ = 0.25, P(e) = 0.02). The size of the bubbles indicates the
number of users with a given degree.



Figure 10: A plot showing the positive correlation between
the degree (x-axis) and the average number of locations
visited for the first time (r = 0.24, ρ = 0.21, P(e) = 0.05).
The size of the bubbles indicates the number of users with a
given degree.


The performance of predictive models based on probabilistic
reasoning depends to a high extent on the existence of a
sufficient location history. The inclusion of the location
histories of friends adds a vast amount of information not
observed by the user. It therefore seems obvious that the
inclusion of social networks would improve prediction
accuracy mainly for users with insufficient (small) location
histories. However, the relationship between improvement in
accuracy and history size shows no tendency, because both
the entropy and the number of locations visited by the user
increase as the size of the location history increases.
Therefore, social influences remain indispensable for
enhancing prediction accuracy, regardless of the size of the
location history of the user in question.

Users who are explorative in nature visit a lot of locations
with similar probabilities, thus their mobility is less
predictable. In 38% of the cases users visit new locations,
but most of these locations have previously been visited by
their friends. Therefore, injecting the location histories of
friends into the individual mobility model of a user can
indeed increase the prediction accuracy. Figure 11 shows a
positive trend between the average number of locations
visited by the different users and the average improvement in
accuracy. The positive trend is confirmed by strong positive
correlation coefficients according to Pearson (r = 0.51) and
Spearman (ρ = 0.42, P(e) = 0).

Figure 11: The plot shows a positive correlation between
the number of locations visited by each user and the average
absolute improvement in accuracy (r = 0.51, ρ = 0.42,
P(e) = 0.0).
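
The observation that many first-time visits go to places already known to friends can be measured along the following lines. This is a simplified sketch, not the procedure used in the study: it assumes a chronologically ordered list of location identifiers per user and a static set of locations found in the friends' histories, and it ignores whether the friend's visit actually preceded the user's. All names and data are hypothetical.

```python
def first_visit_coverage(user_checkins, friend_locations):
    """Return (share of check-ins at new locations,
               share of those new locations found in friends' histories)."""
    seen = set()
    new_visits = 0
    covered_by_friends = 0
    for location in user_checkins:
        if location not in seen:
            new_visits += 1
            if location in friend_locations:
                covered_by_friends += 1
            seen.add(location)
    if new_visits == 0:  # empty history
        return 0.0, 0.0
    return new_visits / len(user_checkins), covered_by_friends / new_visits


# Hypothetical toy history: four of six check-ins are first-time visits.
checkins = ["home", "cafe", "home", "gym", "cafe", "bar"]
friends_seen = {"cafe", "bar", "park"}
new_share, covered_share = first_visit_coverage(checkins, friends_seen)
```

In the toy history two of the four first-time locations ("cafe" and "bar") appear in the friends' set, so half of the new visits would have been visible to a model that includes the friends' location histories.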

The average frequency of visit per location is a measure that
affects the prediction accuracy. The mobility of users with a
low average frequency of visit per location is less
predictable. The incorporation of the location histories of
friends into the individual mobility model of a user helps
increase prediction accuracy. The average frequency of visit
per location in the Foursquare dataset is very low: each user
makes on average 2.04 check-ins per location. Figure 12
underlines the importance of social networks for increasing
prediction accuracy: the relationship between the average
frequency of visit per location and the average improvement
in accuracy due to the integration of the location histories
of friends shows a negative trend. The negative trend is
confirmed by a negative correlation coefficient of r = -0.34
according to Pearson, and a very strong negative correlation
coefficient of ρ = -0.80, P(e) = 0 according to Spearman.
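
For concreteness, the average frequency of visit per location of a single user can be read as the number of check-ins divided by the number of distinct locations. The short sketch below follows this reading; the function name and the two toy histories are assumptions made for illustration.

```python
def average_visits_per_location(checkins):
    """Total check-ins divided by the number of distinct locations."""
    if not checkins:
        raise ValueError("empty location history")
    return len(checkins) / len(set(checkins))


# Hypothetical toy histories.
routine_user = ["home", "work"] * 5                 # 10 check-ins, 2 locations
explorer = ["cafe", "bar", "gym", "park", "cafe"]   # 5 check-ins, 4 locations
routine_avg = average_visits_per_location(routine_user)
explorer_avg = average_visits_per_location(explorer)
```

The routine user revisits each place five times on average, whereas the explorer stays close to one visit per location, the regime in which the text reports the largest benefit from friends' histories.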

The importance of social networks for location prediction
is again underlined by investigating the relationship between
entropy and the improvement in accuracy due to the
integration of the location histories of friends into the
individual mobility model of a user. Entropy is a measure of
the uncertainty associated with predicting the next location
of a mobile user. The high average entropy value of 3.48 is
an indicator of low mobility predictability in the Foursquare
dataset. Figure 13 shows a strong positive trend between
entropy and the improvement in accuracy due to the
integration of social networks. The strong positive trend is
confirmed by very strong correlation coefficient values of
r = 0.64 and ρ = 0.72, P(e) = 0.0 according to Pearson and
Spearman respectively.
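
A common way to quantify this uncertainty is the Shannon entropy of the empirical distribution of a user's visits over locations. The sketch below uses this definition with a base-2 logarithm; the choice of base and the exact estimator are assumptions, since this section does not state how the value of 3.48 was computed.

```python
import math
from collections import Counter


def user_entropy(checkins):
    """Shannon entropy (in bits) of a user's visit distribution over locations."""
    counts = Counter(checkins)
    total = len(checkins)
    # p * log2(1/p) summed over all visited locations
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical toy histories.
single_place = ["home"] * 6                  # no uncertainty
two_places = ["home"] * 4 + ["work"] * 4     # two equally likely locations
four_places = ["a", "b", "c", "d"]           # four equally likely locations
```

Under these assumptions a user who splits visits evenly over n locations has an entropy of log2(n) bits, so an explorative user with many equally likely locations is harder to predict from their own history alone.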

Figure 12: The plot shows a negative correlation between
the frequency of visit per location and the average absolute
improvement in accuracy (r = -0.34, ρ = -0.80, P(e) = 0.0).

Figure 13: Average absolute improvement in accuracy
shows a positive trend with increasing entropy of the users.
Pearson's and Spearman's correlation coefficients are found
to be r = 0.64, ρ = 0.72, P(e) = 0.0 respectively.

Location entropy is a measure of the predictability of
locations. Locations visited by many users with similar
frequencies are highly entropic and their predictability is
associated with high uncertainty. Examples of highly entropic
locations are airports, sport stadiums, underground stations,
etc. A restaurant visited frequently by neighboring residents
and sporadically by visitors from elsewhere is an example of
a location with medium entropy. A private domicile of a user
where friends come by occasionally is an example of a
low-entropy location. Figure 14 shows a strong positive trend
between location entropy and the improvement in accuracy due
to the integration of social networks. The strong positive
trend is confirmed by very strong correlation coefficient
values of r = 0.57 according to Pearson, and ρ = 0.60,
P(e) = 0.0 according to Spearman. The integration of social
networks helps reduce the uncertainty associated with
predicting high-entropy locations.
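
Location entropy can be sketched analogously to user entropy, with the distribution taken over the users who check in at a location rather than over the locations a user visits. The fragment below assumes check-ins are available as (user, location) pairs and again uses a base-2 logarithm; both are assumptions for illustration, and the toy data merely mirror the airport and private-home examples above.

```python
import math
from collections import Counter, defaultdict


def location_entropies(checkins):
    """Map each location to the entropy (bits) of its visitors' check-in shares."""
    visitors = defaultdict(Counter)
    for user, location in checkins:
        visitors[location][user] += 1
    result = {}
    for location, counts in visitors.items():
        total = sum(counts.values())
        result[location] = sum(
            (c / total) * math.log2(total / c) for c in counts.values()
        )
    return result


# Hypothetical toy data: an airport visited once by four users,
# and a private home visited nine times by its owner and once by a friend.
checkins = [(u, "airport") for u in ("u1", "u2", "u3", "u4")]
checkins += [("owner", "home")] * 9 + [("friend", "home")]
entropy_by_location = location_entropies(checkins)
```

The airport reaches the maximum of log2(4) = 2 bits for four visitors, while the home stays well below one bit because a single user dominates its check-ins.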

Figure 14: Average absolute improvement in accuracy
shows a positive tendency with the increasing entropy of
the users. Pearson's and Spearman's correlation coefficients
were found to be r = 0.57, ρ = 0.60, P(e) = 0.0 respectively.

5.3.3 Mobility Models Based on Discrete HMMs

The Hidden Markov Model (HMM) is probably one of the
most popular, well-studied and powerful DBN approaches (??).
Learning the exact parameters of an HMM is an intractable
task: none of the known learning algorithms can find an
exact solution; they tend to converge to a local maximum
[66]. Therefore, the selection of the initial parameters of
an HMM is of immense importance. The selection of proper
initial model parameters helps reduce convergence time and
increase the probability of convergence to a true solution.
HMM modeling therefore requires an in-depth understanding
of the application domain.
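
To make the role of the model parameters concrete, the sketch below scores an observation sequence under a discrete HMM with the standard scaled forward algorithm, which allows candidate parameter sets (for example different initializations) to be compared by log-likelihood. It is not the model of [36]; the two candidate parameter sets, the location symbols and the function name are hypothetical, and the code assumes every observation has nonzero probability under the model.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | model) for a discrete HMM via the scaled forward algorithm.

    start[i]      : probability of starting in state i
    trans[i][j]   : probability of moving from state i to state j
    emit[i][sym]  : probability that state i emits symbol sym
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)  # assumed > 0, see lead-in
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


observations = ["home", "home", "home", "work", "work", "work"]

# Candidate 1: uninformative two-state model.
uniform = dict(
    start=[0.5, 0.5],
    trans=[[0.5, 0.5], [0.5, 0.5]],
    emit=[{"home": 0.5, "work": 0.5}, {"home": 0.5, "work": 0.5}],
)
# Candidate 2: "sticky" states, each tied mostly to one location.
sticky = dict(
    start=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit=[{"home": 0.9, "work": 0.1}, {"home": 0.1, "work": 0.9}],
)
ll_uniform = forward_log_likelihood(observations, **uniform)
ll_sticky = forward_log_likelihood(observations, **sticky)
```

In practice an iterative learner would start from several such candidates and keep the one that ends with the highest likelihood, which is one simple way to reduce the risk of a poor local maximum.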

In [36], two different two-state HMMs were constructed,
one with and the other without considering influences from
social networks. The accuracy of both models (16.4 & 16.6)
is considerably lower than that of the corresponding PPM
VOMM mobility models (18.6 & 23.8) in the previous section.
The poor performance of the HMM is due to the difficulty of
learning the model parameters. The performance of the HMM
decreases as the number of states in the model increases
(for example to 15 & 15.2 when using five states), probably
for one of two reasons: either the data has been drawn from
one distribution, or the location histories of the users are
insufficient to learn the exact model parameters. We assume
the second reason is more probable, because observation size
is known to be a critical issue for HMMs [66, 13]. The
availability of more location history does not necessarily
increase the performance of the model because, as stated
earlier, both the entropy and the number of locations visited
by a user increase as the history size increases. The
increasing entropy and number of locations may either change
the number of states in the model and thus make re-training
the model indispensable (which is another critical issue of
HMMs), or lead to severe readjustment of the model
parameters, especially when frequently occurring states are
no longer relevant for a user (for example, after behavioral
changes due to marriage, moving or a new job).

6. CONCLUSION 

The rapid technological advances of recent years,
especially the pervasiveness of mobile devices such as
smartphones, the spread of mobile access to the Internet and
the emergence of social networking platforms, allow the
collection of vast amounts of data about the behavior of
users, which facilitates the investigation of human mobility.
In this work, using a dataset from the online location-based
social networking platform "Foursquare", we investigated the
existence of statistical interdependence between human
mobility and social proximity, as well as the impact of
social networks on the mobility behavior of mobile users.

The empirical results indeed show a strong interdependence
between social proximity and mobile homophily. An in-depth
correlation analysis between different social proximity
measures such as common neighbors, Jaccard coefficient and
Adamic & Adar on the one hand, and different mobile
proximity measures such as co-location count, social
situation rate, spatial cosine similarity etc. on the other
hand, has confirmed this interdependence. Further, using an
influence model based on a variable-order Markov model, we
have shown that impacts from the social network indeed cause
changes in the mobility behavior of an individual. We
investigated this causal effect by improving the location
prediction for an individual through incorporating the
location histories of their friends. The absolute improvement
in accuracy was 5.2%, the relative improvement even 28%.
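
The two improvement figures are related by simple arithmetic. The sketch below reproduces them under the assumption that the underlying accuracies are the 18.6 and 23.8 reported for the PPM VOMM models without and with social influence in Section 5.3.3; the text does not state this explicitly, but the numbers are consistent with it. Variable names are illustrative.

```python
# Assumed inputs: PPM VOMM accuracies (in %) without and with friends' histories.
accuracy_without_friends = 18.6
accuracy_with_friends = 23.8

# Absolute improvement in percentage points.
absolute_improvement = accuracy_with_friends - accuracy_without_friends

# Relative improvement with respect to the model without social influence.
relative_improvement = 100 * absolute_improvement / accuracy_without_friends

print(f"absolute: {absolute_improvement:.1f} points, "
      f"relative: {relative_improvement:.0f}%")
```

The distinction matters when comparing against other studies: the absolute gain is measured in percentage points, whereas the relative gain depends on how low the baseline accuracy is.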

Privacy concerns are of great relevance to the acceptance
of location-based (social network) services, LB(SN)S, by
users. A service gains higher acceptance if users have the
choice to opt in or opt out of a service, and if they know
who has access to their location information, with whom they
share their information and for how long [62]. The empirical
results have shown that the influence of the movements of
friends approaches zero after three to six weeks. Further,
the inclusion of the location histories of a few friends is
sufficient for enhancing location prediction significantly.
This finding is important for increasing user acceptance of
location prediction: users know that they need to share their
location histories only with a small subset of their close
friends, and only for a comparably limited duration of less
than six weeks.

The empirical results have shown that the mobility of an
individual is influenced mostly by the members of the same
cohesive subgroups and that cohesion inside the groups shows
a very strong correlation with the improvement in accuracy.
The members of the same cohesive subgroup are responsible for
transferring social influence and trends inside the group.
They exhibit similarities in their goals, beliefs,
information, emotional needs, interests etc., resulting in a
high similarity in their information and behavior. The users
in a social network also have ties beyond those to the
members of their cohesive subgroups. Information exchange
(co-locations, social situations) between two users connected
via a weak tie is responsible for the transmission of more
novel information between different cohesive subgroups,
because the two users are members of different cohesive
subgroups and their information therefore differs. Central
users in a social network (in terms of degree centrality)
have a lot of weak ties; they are thus very important for
the spread of social influence between different social
communities and for setting new social trends.

Location prediction can be used in a variety of services
such as optimizing fuel consumption and reducing CO2
emissions in vehicles [26, 22, 45], increasing driving
efficiency and safety [83], increasing the performance of
the high-voltage battery pack in hybrid electric vehicles
(HEV) ([63] as cited by [42]), mobile marketing and
intelligent mobile advertising [11, 12], saving energy in
private households [34], getting ahead of the demand curve
and being more proactive in deploying aid and rescue
capabilities during disaster relief scenarios ([27, 29] as
cited by [28]), rehabilitation or crime suppression using
electronic monitoring, healthcare monitoring systems [40],
modeling the spread of human and electronic viruses, city
planning, resource management in mobile communications [71],
traffic management and public transport recommender systems
[65], presence prediction for face-to-face meetings,
emergency calls or intelligent postal services [42], and
supporting assistive technologies for disabled or
cognitively impaired persons (Alzheimer's) [61].